LONG-TERM GOALS

The  primary  goals  of  this  research  project  are  to  systematically  assess  the  performance  and  refine  the 
numerical algorithms, physical parameterizations, and computational strategies used in the Regional Ocean
Modeling System (ROMS). ROMS is a relatively new oceanic model, with parallel and
loosely-coordinated  developments  between  Dr.  Hernan  Arango  (Rutgers)  and  us.  It  has  established  itself 
as  a  viable  community  model  with  a  growing  number  of  research  users. 

OBJECTIVES 

The  objectives  of  this  project  are  the  following:  (1)  consolidation  of  the  computational  kernel  of  ROMS 
with respect to pressure-gradient force, equation of state, embedded grids, time-stepping, vertical-mode
coupling,  and  biological  modeling;  (2)  further  development  of  parallelization  capabilities,  including  a 
hybrid  combination  of  message-passing  and  shared  memory  methods,  plus  migration  of  ROMS  to  the 
IBM  SP  computer  and  implementation  of  generalized  (unstructured)  message-passing;  (3)  application  of 
polynomial  reconstruction  schemes  to  advection,  dissipation,  and  mixing  algorithms;  (4)  improvements 
of  the  K-Profile  Parameterization  (KPP)  for  surface  and  bottom  boundary  layer  mixing;  and  (5) 
development  of  a  dynamically  adaptive  vertical  grid  as  a  generalization  of  the  present  sigma-coordinate 
grid,  to  combine  the  distinctive  advantages  of  height,  density,  and  terrain-following  coordinates. 

APPROACH 

The  primary  design  goal  for  ROMS  is  to  produce  limited-area,  high-resolution,  realistic  coastal 
simulations  in  an  efficient  manner  on  parallel  computers.  The  technical  approach  is  computational 
simulation  of  oceanic  fields  for  velocity,  temperature,  and  salinity;  chemical  concentrations  of  nutrients, 
O2,  CO2,  etc.;  planktonic  populations;  and  mobile  sediments.  ROMS  is  based  on  the  hydrostatic 
Primitive  Equations  in  terrain-following  curvilinear  coordinates  with  a  free  upper  surface.  It  contains  a 
variety  of  innovative  algorithms,  including  an  advection  operator  designed  to  reduce  dispersive  errors 
and,  consequently,  excessive  dissipation  rates,  thereby  effectively  boosting  the  resolution  on  a  given  grid 
(Shchepetkin  &  McWilliams,  1998).  The  boundary-value  problems  that  are  our  focus  are  for  various 
regional domains along the North American West Coast (e.g., Marchesiello et al., 2002) with specified
surface forcing fields and boundary data, including the output from a whole-Pacific ROMS configuration.
The boundary data are imposed by adaptive open boundary conditions (Marchesiello et al., 2001). We
have developed and implemented a hierarchical embedding capability for the local, fine-resolution grid in
a sub-domain within the coarse-resolution grid spanning the entire domain (Penven et al., 2002). Key
researchers  at  UCLA  on  this  project  are  Meinte  Blaas,  Xavier  Capet,  Patrick  Marchesiello,  James 
McWilliams,  Pierrick  Penven,  and  Alexander  Shchepetkin,  as  well  as  Nicholas  Gruber,  Hartmut  Frenzel, 
and  Keith  Stolzenbach  for  biogeochemical  and  sedimentary  issues. 

WORK  COMPLETED 

During  the  past  year  we  have  made  good  progress  on  objectives  (1),  (2),  and  (4) — see  Results  below.  For 
objective  (3)  we  expect  to  write  a  paper  soon  on  lateral  mixing  in  general  curvilinear  coordinates.  Work 
on  objective  (5)  is  deferred. 

RESULTS 

Sigma-coordinate  pressure  gradient  problem:  We  completed  this  development,  including  a  new 
treatment  of  the  seawater  Equation  of  State  (EOS),  test  problems  under  realistic  simulation  conditions, 
and a proof of discrete energetic consistency extended to the case of higher than second-order accuracy. In our new
treatment  of  the  compressibility  effect  in  the  seawater  EOS,  we  abandon  computation  of  in  situ  density  in 
favor  of  density  gradients  formulated  in  terms  of  adiabatic  differences,  which  allows  us  to  bring  the 
mathematical  criterion  of  grid-scale  smoothness  of  the  discrete  field  (needed  to  avoid  spurious 
oscillations  of  polynomial  interpolants)  into  the  context  of  positive-definiteness  of  density  stratification. 
This work is now completed (Shchepetkin & McWilliams, 2002a). An accompanying intercomparison
for a wider range of discretization methods has also been published (Ezer et al., 2002).

Figure 1: Volume-averaged kinetic energy of spurious flow
generated by pressure-gradient error in a flat-stratification
seamount test problem as a function of time (in days) for
five different pressure-gradient schemes. POM: density
Jacobian from the Princeton Ocean Model; Lin 97: finite-volume
method of Lin (1997); γ=0.5: weighted Jacobian
of Song (1998) with weighting coefficient decreased by a
factor of 2 (i.e., a half-and-half average of the POM and Song
(1998) Jacobians, which is the optimal weighting in terms
of error among all possible linear combinations of these
Jacobians); Cubic A: a fourth-order accurate density Jacobian
scheme using algebraic averaging of elementary differences
to estimate density derivatives at density points; Cubic H:
same as Cubic A, but with harmonic averaging to constrain
monotonicity of cubic polynomial interpolants for the density
field.
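
The difference between the Cubic A and Cubic H schemes in Figure 1 lies in how neighboring elementary differences are averaged to estimate a derivative at a density point. The following is a minimal sketch of that one step, assuming a one-dimensional density profile; the function name and the sample profile are hypothetical and this is not the ROMS source code.

```python
import numpy as np

def point_slopes(rho, harmonic=True):
    """Estimate differences at interior points from neighboring elementary differences.

    rho      : 1D sequence of density values (illustrative input)
    harmonic : True  -> harmonic averaging (in the spirit of "Cubic H")
               False -> algebraic averaging (in the spirit of "Cubic A")
    """
    rho = np.asarray(rho, dtype=float)
    d = np.diff(rho)            # elementary differences between neighbors
    dm, dp = d[:-1], d[1:]      # difference below / above each interior point
    if not harmonic:
        return 0.5 * (dm + dp)
    prod = dm * dp
    out = np.zeros_like(prod)
    same_sign = prod > 0.0      # harmonic mean only where both differences agree in sign
    out[same_sign] = 2.0 * prod[same_sign] / (dm[same_sign] + dp[same_sign])
    return out                  # zero wherever the profile has a local extremum or flat spot

# Illustrative profile with a sharp step between two nearly flat regions.
profile = [0.0, 0.0, 0.0, 1.0, 5.0, 5.1, 5.1]
slopes_h = point_slopes(profile, harmonic=True)
slopes_a = point_slopes(profile, harmonic=False)
```

In this sketch the harmonic estimate never exceeds twice the smaller of the two neighboring differences and vanishes where they disagree in sign, which is the property that keeps a cubic interpolant from overshooting. The algebraic estimate has no such bound next to the step.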

Time-stepping  and  time-splitting  algorithms:  At  present  the  UCLA  ROMS  model  employs  a 
predictor-corrector  time-step  for  both  barotropic  and  baroclinic  modes,  with  forward-backward  feedback 
between the free-surface/tracer and momentum equations at every stage to increase the stability limits on
the  time-step  size  associated  with  internal  waves.  A  forward  step  is  used  for  lateral  viscosity  and 
diffusion  and  a  backward  step  for  vertical  diffusion.  In  both  cases  viscous/diffusive  terms  are  computed 
only once per time step to save the computational costs of the rotation of horizontal mixing terms
(necessary  due  to  sigma  coordinates)  and  the  vertical  mixing  parameterization.  A  new  version  of  the 
barotropic  mode  using  a  generalized  forward-backward  time  step  is  now  available  and  has  been  shown  to 
be  more  efficient  than  predictor-corrector  in  terms  of  stability  limit  vs.  computational  cost.  Also,  the 
barotropic-baroclinic mode coupling can now occur during either the predictor or corrector stage of the
baroclinic  time  step  (both  versions  are  available).  The  predictor-coupled  mode  algorithm  uses 
forward-in-time  extrapolation  of  the  mode-coupling  terms  (vertically  integrated  r.h.s.  of  3D  momentum 
equations  minus  r.h.s.  computed  from  barotropic  variables),  in  a  manner  similar  to  an  Adams-Bashforth 
time step, before the coupling terms are applied as forcing at every step of the barotropic mode. No such
extrapolation  is  required  for  the  corrector-coupled  version,  because,  after  the  predictor  stage  is  complete, 
the  newly  computed  3D  variables  are  already  time-centered  half-way  between  the  n  and  n  +  1  time 
levels.  This  also  makes  it  possible  to  retain  only  terms  critical  for  the  computational  stability  of  internal 
waves  and  advection  during  the  predictor  stage  for  the  3D  momentum  equations.  This  mitigates 
computational  cost.  This  analysis  is  nearly  finalized,  and  an  associated  manuscript  is  being  prepared 
(Shchepetkin  &  McWilliams,  2002b). 
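
To illustrate the forward-backward idea mentioned above, here is a minimal sketch for the linearized one-dimensional shallow-water equations on a staggered, periodic grid. It shows the plain forward-backward step only, not the generalized version developed for ROMS. The function name, grid spacing, depth, and initial condition are all assumptions chosen for illustration.

```python
import numpy as np

def forward_backward_run(nx=200, courant=0.8, nsteps=2000, g=9.81, depth=100.0):
    """Integrate 1D linear shallow-water waves with a forward-backward step.

    zeta[i] lives at the center of cell i, u[i] at its left face (periodic domain).
    """
    dx = 1.0e3                                   # grid spacing in meters (assumed)
    c = np.sqrt(g * depth)                       # gravity-wave speed
    dt = courant * dx / c                        # time step from the chosen Courant number
    x = (np.arange(nx) + 0.5) * dx
    zeta0 = np.exp(-((x - 0.5 * nx * dx) / (10.0 * dx)) ** 2)   # smooth initial bump
    zeta = zeta0.copy()
    u = np.zeros(nx)
    for _ in range(nsteps):
        # Forward step: advance the free surface using the old velocity.
        zeta = zeta - dt * depth * (np.roll(u, -1) - u) / dx
        # Backward step: advance velocity using the newly updated free surface.
        u = u - dt * g * (zeta - np.roll(zeta, 1)) / dx
    return zeta0, zeta, u

zeta0, zeta, u = forward_backward_run()
```

In this simple setting the scheme stays bounded for Courant numbers below one at the cost of a single right-hand-side evaluation per variable per step, whereas a predictor-corrector step evaluates each right-hand side twice. That trade-off between stability limit and cost is the one discussed above.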

Parallel  code  design  and  portability  issues:  As  stated  in  the  proposal,  UCLA  ROMS  is  committed  to 
supporting  both  shared-memory  and  MPI  implementations  within  the  same  source  code.  However,  for 
performance  reasons  our  implementation  of  the  shared-memory  capability  was  bound  to  the  SGI  Origin 
2000, which historically has been our primary production environment. After the release of the OpenMP 2.0
Fortran specification (http://www.openmp.org) and the subsequent appearance of
compilers supporting the new standard, it became possible to create portable code without
compromising its performance. Correspondingly, our shared-memory implementation of ROMS was
redesigned to comply with the OpenMP 2.0 standard. This work is now at a mature stage,
with our embedded-gridding capability now being converted. Besides the SGI Origin 2000, the
new shared-memory implementation was tested and proven to work "out of the box" on IBM
and Intel platforms. The Intel option is especially intriguing due to recent rapid progress in low-cost,
commodity hardware, which in fact may outperform high-end supercomputers in terms of processing
power per CPU, provided that codes are properly optimized to take into account cache effects (Fig. 2).

The use of commodity PC hardware is often associated with Linux clusters running MPI-parallelized
codes that mostly conform to a one-subdomain-per-processor strategy. Fig. 2 shows in addition that one
may take advantage of the multiple-subdomains-per-processor option of ROMS and significantly (by a factor
of 2.5) improve the utilization of hardware.
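
The multiple-subdomains-per-processor idea can be sketched as follows. This is an illustrative Python analogue, not ROMS code: the function names are hypothetical, and the per-tile work is a stand-in for a real model kernel. The point is only that the same grid can be cut into more tiles than there are processors, so that each tile's working set is small.

```python
import numpy as np

def tile_bounds(n, ntiles):
    """Split range(n) into ntiles contiguous index ranges whose sizes differ by at most 1."""
    base, extra = divmod(n, ntiles)
    bounds, start = [], 0
    for t in range(ntiles):
        size = base + (1 if t < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def make_tiles(nx_pts, ny_pts, ntile_x, ntile_y):
    """List of ((i0, i1), (j0, j1)) index ranges for an ntile_x x ntile_y partition."""
    return [(ix, jy)
            for jy in tile_bounds(ny_pts, ntile_y)
            for ix in tile_bounds(nx_pts, ntile_x)]

def assign_tiles(ntiles, nthreads):
    """Give each thread a contiguous block of tile numbers to process one after another."""
    return [list(range(*b)) for b in tile_bounds(ntiles, nthreads)]

def tiled_sum_of_squares(field, ntile_x, ntile_y):
    """Stand-in kernel: visit the field tile by tile so each pass touches a small block."""
    total = 0.0
    for (i0, i1), (j0, j1) in make_tiles(field.shape[1], field.shape[0], ntile_x, ntile_y):
        block = field[j0:j1, i0:i1]
        total += float(np.sum(block * block))
    return total

# A 128 x 128 horizontal grid cut into 2 x 16 tiles and shared by 2 threads.
tiles = make_tiles(128, 128, 2, 16)
schedule = assign_tiles(len(tiles), 2)
```

Choosing the tile count is then a tuning parameter independent of the processor count, which is how a single code can be adapted to machines with different cache sizes.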

Transition  to  Fortran  90:  This  effort  is  primarily  motivated  by  our  embedded-gridding  project  (where 
the  use  of  Fortran  90  features  is  indispensable)  and  is  closely  related  to  the  prototype  code  development, 
as  well  as  reflecting  recent  trends  in  compiler  technology.  It  is  also  our  desire  to  keep  the  embedded  and 
non-embedded codes as close as possible so that implementations of algorithms can be quickly transferred
from one to the other. The decision making associated with this change is rather complex, since in present
F90 compilers many features actually impede computational performance. Therefore, we adopt a
balanced approach that allows a smooth transition to F90 wherever it is advantageous (e.g., an
irreversible transition to an F90-only environment would impede the use of ROMS on Linux computers,
thus cutting off a relatively wide and growing portion of the user community). During the last year we
developed  a  series  of  automatic  tools  to  identify  and  convert  types  of  variables  and  constants  (automatic 
promotion to double precision, if so desired), to convert F77 common blocks to F90 modules, etc. These
tools may be used both as code-development instruments to make permanent changes and at
compile time to make reversible automatic changes in order to ensure the portability of the code.

Figure 2: Computational performance of the ROMS code as a function of subdomain partitioning (blocking)
policy on different hardware platforms for a 3/4 degree Atlantic Ocean test configuration. The grid resolution
is 128 x 128 x 20, resulting in 100 MB of memory storage. Horizontal axis: number of subdomains (the
two-dimensional arrangement is written in each column in NX x NY format); vertical axis: computational
speed expressed in model time steps per minute of wall clock time. In all cases parallelization is done
via OpenMP and no adjustments to the code are made other than choosing a different number of
subdomains. Intel 933 MHz is a dual-processor Pentium III SMP machine running Linux and using the Intel IFC
6.0 FORTRAN 90/95 compiler. The strong dependence of computational performance on the number of
subdomains for the Intel platform is explained by cache effects due to the combination of small cache, fast
processors, and limited memory bandwidth, which is by far the dominant factor for optimization strategy in
this case. For all other platforms the effect is less significant and, in fact, the most significant influence
on performance can be traced to the side effects of shortening the innermost loops when decreasing
subdomain size. Nevertheless, for a properly optimized code (the number of subdomains is chosen to make
subdomains sufficiently small to fit into cache), even the previous generation of Intel platforms tends to
outperform the other computers presented here in terms of processing power per CPU, despite the fact
that its cost is only a small fraction of the cost of the others. We have preliminary experience with the newer 2
GHz P4 CPU, which makes this comparison even more striking (P4 SMPs are just emerging on the
market but are not yet widely available).

KPP of vertical mixing: Relative to the original publication (Large et al., 1994) and subsequent
implementation in NCAR’s and other z-coordinate general circulation models, the KPP vertical mixing
parameterization was modified in several aspects in order to make it suitable for the framework of the
terrain-following coordinates of ROMS. This is due to the B-C grid difference and, more importantly, due to
the discretization algorithms of a sigma-grid with their horizontally variable vertical resolution and
concerns about hydrostatic consistency, which, unlike z-coordinate models, do not allow an arbitrary
increase of vertical resolution. Consequently, the KPP scheme has to cope with a coarser vertical
resolution, resulting in tougher requirements on the vertical discretization methods. Thus, instead of
estimating  Richardson  number  via  finite  differences  at  every  grid  point  and  then  interpolating  it  linearly 
to  determine  the  boundary  layer  depth  (where  a  bulk  Richardson  number  crosses  its  critical  value),  the 
problem  is  reformulated  in  terms  of  low-order  polynomial  fits  of  prognostic  variables — velocity  and 
density — which  tend  to  possess  a  smoother  behavior  than  Richardson  number,  resulting  in  more  accurate 
positioning of the boundary layer edge. This procedure alone significantly mitigates the "steppiness" in
boundary layer deepening and eliminates any necessity of spatial smoothing of the Richardson
number for numerical reasons (a common practice in z-coordinate models). We are now doing a general
reconsideration  of  the  physical  rules  in  KPP,  specifically  focusing  on  how  high-frequency  surface  forcing 
systematically  increases  the  boundary  layer  depth  when  the  stratification  is  strong. 
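
For reference, the conventional procedure that the polynomial-fit formulation replaces can be sketched as follows: compute a bulk Richardson number at each grid level and interpolate linearly to the depth where it first reaches the critical value. This is a simplified illustration under stated assumptions, not the ROMS or NCAR implementation. The function name is hypothetical, the critical value of 0.3 is an assumed default, and the unresolved turbulent shear contribution that the full KPP adds to the denominator is omitted.

```python
import numpy as np

def boundary_layer_depth(depth, buoy, u, v, ri_crit=0.3):
    """Depth where a simplified bulk Richardson number first reaches ri_crit.

    depth : level depths, positive downward and increasing; depth[0] is the reference level
    buoy  : buoyancy at each level (m/s^2)
    u, v  : horizontal velocity components at each level (m/s)
    """
    depth = np.asarray(depth, dtype=float)
    buoy = np.asarray(buoy, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    dbuoy = buoy[0] - buoy                           # buoyancy deficit relative to reference
    shear2 = (u[0] - u) ** 2 + (v[0] - v) ** 2       # squared velocity difference
    ri = dbuoy * depth / (shear2 + 1.0e-12)          # small constant avoids division by zero
    for k in range(1, len(depth)):
        if ri[k] >= ri_crit:
            # Linear interpolation between the two levels that bracket the crossing.
            frac = (ri_crit - ri[k - 1]) / (ri[k] - ri[k - 1])
            return depth[k - 1] + frac * (depth[k] - depth[k - 1])
    return depth[-1]                                 # no crossing: layer reaches the last level

# Illustrative coarse profile: the crossing falls between the 20 m and 30 m levels.
h = boundary_layer_depth(
    depth=[0.0, 10.0, 20.0, 30.0, 40.0],
    buoy=[0.0, -0.01, -0.01, -0.02, -0.05],
    u=[1.0, 0.0, 0.0, 0.0, 0.0],
    v=[0.0, 0.0, 0.0, 0.0, 0.0],
)
```

With only a few levels, the diagnosed depth depends strongly on which pair of levels brackets the crossing, so it tends to move in steps as the layer deepens. That behavior is what motivates fitting smooth polynomials to velocity and density instead.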

IMPACT/APPLICATIONS 

The  validated  technical  innovations  in  our  evolving  model  are  prototypes  for  future  improvements  in 
operational  observing-system,  data-assimilation,  and  prediction  capabilities.  The  scientific  understanding 
of  the  coastal  oceans  is  relevant  to  the  U.S.  Navy’s  missions. 

TRANSITIONS 

One  tangible  measure  of  the  utility  of  our  results  is  that  other  researchers  are  either  using  our  evolving 
ROMS  code  or  adapting  its  algorithms  for  their  own  code.  Current  users  of  our  version  of  ROMS  include 
Chao  and  Li  (NASA/JPL),  Miller  and  Cornuelle  (SIO),  Moisan  (NASA/Wallops),  and  the  Monterey  Bay 
NOPP SCOPE team: Chavez (MBARI), Chai (Maine), et al. Arango and Haidvogel (Rutgers) have
adapted  many  features  for  their  version  of  ROMS.  In  the  near  future  we  anticipate  additional  users,  partly 
through  the  ONR-sponsored,  terrain-coordinate  model  development  project  (TOMS).  We  are  contributing 
useful  knowledge  about  coastal  modeling  methodology  and  phenomena  through  published  papers. 

RELATED  PROJECTS 

Our  recent  venture  into  coastal  oceanography  now  extends  into  several  related  projects.  We  began  with  a 
focus  on  the  Southern  California  Bight,  especially  with  regard  to  its  water  quality  [a  California  Sea  Grant 
project]. We are just completing an ONR project on developing the embedded gridding capability for
ROMS.  We  have  a  joint  project  with  Chao  [NASA/JPL]  on  using  embedded  grids  in  ROMS  for  studying 
Eastern  and  Western  Boundary  Current  interactions  with  the  North  Pacific  gyres  [NASA].  We  have  a 
project  jointly  with  Moisan  (NASA),  Miller  and  Cornuelle  (SIO),  and  Haidvogel  and  Wilkin  (Rutgers)  to 
model  the  coastal  carbon  cycle  [NASA].  We  are  partners  in  the  NOPP  SCOPE  project  for  developing 
models  and  analyses  for  the  Monterey  National  Marine  Sanctuary.  We  have  also  submitted  a  proposal  to 
ONR to participate in the Autonomous Ocean Sampling Network II field experiment in summer 2003.